The checking subsystem also makes use of a User dictionary: a collection of user-specific elements called UDitems. A UDitem is either a literal string (usually a word, as in any word processor's user dictionary) or a regular expression. A regular expression defines a pattern, range, or class of characters, either singly or as a group. A string being checked is accepted if it conforms to at least one item in the specified section of the User dictionary. For a regular-expression item, this means that each string a recognition module passes for checking is tested against the pattern that the expression defines.
In the following example, regular expressions are used to check whether the recognized strings comply with the postal code formats used mostly in Europe or the ZIP code formats used in the US.
Adding Literal and Regular Expression UDitems to the User Dictionary
```c
AT_ERRCOUNT nErrCount;
LPCSTR sect1 = "ZIP_Section";
LPCWSTR item_literal = L"Accusoft";
// US ZIP code: 12345 or 12345-6789 (ZIP+4 uses four digits after the hyphen)
LPCWSTR US_postal_zip = L"\\d{5}(-\\d{4})?";
// European postal code: D-12345 or H-1234
LPCWSTR European_postal_zip = L"[A-Z]-\\d{4,5}";

// Open user dictionary
nErrCount = IG_REC_UD_edit_open();

// 0 as a third parameter denotes a literal string
// and 1 denotes a regular expression
nErrCount = IG_REC_UD_item_add(sect1, item_literal, 0);
nErrCount = IG_REC_UD_item_add(sect1, US_postal_zip, 1);
nErrCount = IG_REC_UD_item_add(sect1, European_postal_zip, 1);

// Close user dictionary
nErrCount = IG_REC_UD_edit_close();
```
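The acceptance rule described above (a string passes if it conforms to at least one item in the section) can be modelled outside the recognition engine. The following is a minimal sketch, not the engine's implementation: `ud_item`, `regex_full_match`, `ud_accepts`, and `zip_section` are hypothetical names, and the sketch assumes that a match must cover the whole string and that literal comparison is case-sensitive. It uses POSIX extended regular expressions, which have no `\d`, so digits are written as `[0-9]`.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a UDitem: a literal or a regular expression. */
typedef struct {
    const char *text;  /* literal text or POSIX extended regular expression */
    int is_regex;      /* 0 = literal string, 1 = regular expression */
} ud_item;

/* Returns 1 if the whole string s matches pattern, 0 otherwise.
   Assumption: "conforms to" means a full-string match, so the pattern
   is wrapped in ^( ... )$ before it is compiled. */
int regex_full_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int n, ok;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;  /* pattern too long for this sketch */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;  /* an invalid pattern accepts nothing */
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Returns 1 as soon as one item accepts s, 0 if none does. */
int ud_accepts(const ud_item *items, size_t count, const char *s)
{
    for (size_t i = 0; i < count; ++i) {
        int hit = items[i].is_regex
                      ? regex_full_match(items[i].text, s)
                      : strcmp(items[i].text, s) == 0; /* case-sensitive: assumption */
        if (hit)
            return 1;
    }
    return 0;
}

/* Items modelled on a section holding one literal and two patterns. */
const ud_item zip_section[] = {
    { "Accusoft", 0 },
    { "[0-9]{5}(-[0-9]{4})?", 1 },  /* US ZIP code, optional ZIP+4 suffix */
    { "[A-Z]-[0-9]{4,5}", 1 },      /* European style: D-12345 or H-1234 */
};
```

With these items, `ud_accepts(zip_section, 3, "H-1234")` returns 1, while `ud_accepts(zip_section, 3, "H-123456")` returns 0 because no item matches the whole string.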
Within the User dictionary, UDitems can be organized into sections. Zones are always associated with a section of the User dictionary when they are created.
UD checking is worth performing in several situations.
If the application uses spell checking and consistently finds correctly spelled words marked as uncertain, or if the document is known to contain many proper nouns, UD checking can supplement the spell checking. This reduces unwanted marking and improves recognition accuracy, provided that the required words have been added to the User dictionary beforehand. In this case UD checking is complementary to spell checking.
UD checking without spell checking enabled is typically used in form-like applications where the data to be recognized is highly structured and follows predictable patterns (e.g., questionnaires).
The User dictionary file itself is specified as a page-level setting. Once specified, it applies to all zones on the page. However, because the User dictionary may contain several sections, and each zone is assigned its own section, different zones can be checked against different sets of dictionary items. For particular zones, UD checking can be disabled with the IG_REC_ZCF_USERDICT_PROHIBIT flag.
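The relationship between the page-level dictionary, per-zone sections, and the prohibit flag can be pictured with a small model. This is only a sketch under stated assumptions: every identifier below is hypothetical and none of them is an ImageGear name (`ZONE_FLAG_UD_PROHIBIT` merely stands in for IG_REC_ZCF_USERDICT_PROHIBIT). Items are literals only to keep the sketch short, and treating an unknown section as a rejection is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for IG_REC_ZCF_USERDICT_PROHIBIT. */
#define ZONE_FLAG_UD_PROHIBIT 0x1u

typedef struct {
    const char *name;          /* section name */
    const char *const *items;  /* literal items only in this sketch */
    size_t count;
} ud_section;

typedef struct {
    const ud_section *sections;  /* one dictionary shared by the whole page */
    size_t count;
} user_dictionary;

typedef struct {
    const char *section;  /* section assigned when the zone is created */
    unsigned flags;
} zone;

typedef enum { UD_SKIPPED, UD_ACCEPTED, UD_REJECTED } ud_result;

/* Checks s against the section assigned to zone z, unless the zone
   has UD checking disabled. */
ud_result zone_ud_check(const user_dictionary *ud, const zone *z, const char *s)
{
    if (z->flags & ZONE_FLAG_UD_PROHIBIT)
        return UD_SKIPPED;  /* UD checking disabled for this zone */

    for (size_t i = 0; i < ud->count; ++i) {
        const ud_section *sec = &ud->sections[i];
        if (strcmp(sec->name, z->section) != 0)
            continue;  /* not the section assigned to this zone */
        for (size_t j = 0; j < sec->count; ++j)
            if (strcmp(sec->items[j], s) == 0)
                return UD_ACCEPTED;
        return UD_REJECTED;  /* section found, but no item matched */
    }
    return UD_REJECTED;  /* unknown section: assumed to reject */
}
```

In this model, two zones on the same page share one dictionary but can give different results for the same string, because each zone is checked only against its own section.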